Building A High Performance Parallel File System Using Grid Datafarm and ROOT I/O

Authors

  • Youhei Morita
  • Hiroyuki Sato
  • Yoshiyuki Watase
  • Osamu Tatebe
  • Satoshi Sekiguchi
  • Satoshi Matsuoka
  • Noriyuki Soda
  • A. Dell'Acqua

Abstract

The sheer amount of petabyte-scale data foreseen in the LHC experiments requires careful consideration of both the persistency design and the system design for world-wide distributed computing. The event parallelism of HENP data analysis lets us take maximum advantage of high-performance cluster computing and networking, provided that the parallelism is preserved in the data processing, data management, and data transfer phases. The modular architecture of FADS/Goofy, a versatile detector simulation framework for Geant4, makes it easy to choose plug-in facilities for persistency technologies such as Objectivity/DB and ROOT I/O. The framework is designed to work naturally with the parallel file system of Grid Datafarm (Gfarm). FADS/Goofy has been shown to generate 10^6 Geant4-simulated Atlas Mockup events using a 512-CPU PC cluster. The data in ROOT I/O files is replicated using the Gfarm file system, and the histogram information is collected from the distributed ROOT files. During data replication, a transfer rate of more than 2.3 Gbps was demonstrated among seven participating PC clusters in the United States and Japan.
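
Event parallelism means each event can be processed without reference to any other, so events can be partitioned across nodes and the per-node results (such as histograms) combined afterwards. The following is a minimal Python sketch of that idea, not the authors' FADS/Goofy or Gfarm code: `simulate_event` is a hypothetical stand-in for a Geant4 event, threads stand in for cluster nodes, and plain lists of bin counts stand in for ROOT histograms.

```python
from concurrent.futures import ThreadPoolExecutor
import random

NBINS = 10
EMIN, EMAX = 0.0, 100.0  # hypothetical energy range, arbitrary units

def simulate_event(event_id: int) -> float:
    """Hypothetical stand-in for one simulated event: returns a deposited energy.
    Seeding by event_id makes each event reproducible on whichever node runs it."""
    rng = random.Random(event_id)
    return rng.uniform(EMIN, EMAX)

def fill_histogram(values):
    """Fill a fixed-width histogram (list of bin counts)."""
    counts = [0] * NBINS
    width = (EMAX - EMIN) / NBINS
    for v in values:
        b = int((v - EMIN) / width)
        b = min(max(b, 0), NBINS - 1)  # clamp edge values into the last bin
        counts[b] += 1
    return counts

def process_partition(event_ids):
    """One 'node': simulate its own events and fill a local histogram, no communication."""
    return fill_histogram(simulate_event(i) for i in event_ids)

def merge(histograms):
    """Collect per-node histograms by summing bin counts."""
    return [sum(col) for col in zip(*histograms)]

def run_parallel(n_events, n_nodes):
    # Round-robin split of event IDs across hypothetical nodes.
    partitions = [range(k, n_events, n_nodes) for k in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        local = list(pool.map(process_partition, partitions))
    return merge(local)

if __name__ == "__main__":
    serial = process_partition(range(10_000))
    parallel = run_parallel(10_000, n_nodes=8)
    assert serial == parallel  # same events, same merged histogram
    print(parallel, sum(parallel))
```

Because no event depends on another, the merged histogram is identical to a serial run regardless of how events are distributed; only the final merge step needs to gather data from all nodes.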

Similar resources

Gfarm V2: a Grid File System That Supports High-performance Distributed and Parallel Data Computing

Grid Datafarm architecture is designed for facilitating reliable file sharing and high-performance distributed and parallel data computing in a Grid across administrative domains by providing a global virtual file system. Gfarm v2 is an attempt to implement a global virtual file system that supports a complete set of standard POSIX APIs, while still retaining the parallel and distributed data c...

Worldwide Fast File Replication on Grid Datafarm

The Grid Datafarm architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. One of its features is that it manages file replicas in filesystem metadata for fault tolerance a...

Optimization of Docking Conformations Using Grid Datafarm

Grid Datafarm (GFarm) is a Japanese national project that aims to design an infrastructure for global petascale data intensive computing. GFarm tools and APIs are provided to handle large data files in both single filesystem image and local file views. While the Grid Datafarm is originally motivated by high energy physics applications, it is a generic distributed I/O management and scheduling i...

Evaluating the Shared Root File System Approach for Diskless High-Performance Computing Systems

Diskless high-performance computing (HPC) systems utilizing networked storage have become popular in the last several years. Removing disk drives significantly increases compute node reliability as they are known to be a major source of failures. Furthermore, networked storage solutions utilizing parallel I/O and replication are able to provide increased scalability and availability. Reducing a...

Performance Evaluation of Software RAID vs. Hardware RAID for Parallel Virtual File System

Linux clusters of commodity computer systems and interconnects have become the fastest growing choice for building cost-effective high-performance parallel computing systems. The Parallel Virtual File System (PVFS) could potentially fulfill the requirements of large I/O-intensive parallel applications. It provides a high-performance parallel file system by striping file data across multiple clu...

Journal: CoRR

Volume: cs.DC/0306092

Publication date: 2003